Add BONSAI-style greedy pruning for NChooseK constraints #747
jduerholt wants to merge 19 commits
Conversation
Implement greedy post-processing pruning based on the BONSAI algorithm (https://arxiv.org/abs/2602.07144) to handle NChooseK constraints more effectively. Instead of encoding NChooseK as nonlinear constraints during acquisition optimization, candidates are optimized without them and then pruned in post-processing by greedily zeroing the feature with the smallest acquisition-function impact until the constraint is satisfied.

- Add Domain.is_nchoosek_pruning_applicable() to check if pruning is safe (no NChooseK feature appears in any other constraint)
- Add exclude_nchoosek param to get_nonlinear_constraints()
- Skip NChooseK in optimize_acqf when pruning is applicable
- Override _postprocess_candidates in BotorchStrategy with Ax-style greedy pruning that conditions on already-pruned candidates for q>1

Co-Authored-By: Claude Opus 4.6 <[email protected]>
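The greedy loop described above can be sketched as a minimal pure-Python stand-in (the real implementation operates on torch tensors and conditions on already-pruned candidates; `acqf` here is any hypothetical callable scoring one candidate):

```python
from typing import Callable, List


def greedy_nchoosek_prune(
    x: List[float],
    max_count: int,
    acqf: Callable[[List[float]], float],
) -> List[float]:
    """Greedily zero features until at most max_count are nonzero.

    At each step, try zeroing every still-nonzero feature and keep
    the variant with the highest acquisition value, i.e. the feature
    whose removal has the smallest impact.
    """
    x = list(x)
    while sum(v != 0.0 for v in x) > max_count:
        best_variant, best_val = None, float("-inf")
        for i, v in enumerate(x):
            if v == 0.0:
                continue
            variant = list(x)
            variant[i] = 0.0  # candidate action: zero feature i
            val = acqf(variant)
            if val > best_val:
                best_variant, best_val = variant, val
        x = best_variant
    return x
```

With `acqf=sum` and `max_count=2`, the feature with the smallest value is zeroed first.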
Relax is_nchoosek_pruning_applicable() to allow NChooseK features that also participate in linear equality/inequality constraints (only block for nonlinear/product/interpoint overlap). When linear overlap exists, pruning variants use QP projection (LinearProjection) to find a feasible warm-start with x_k=0, then a local optimize_acqf to find the best achievable AF value, giving accurate greedy decisions. The simple zeroing path is preserved when no linear overlap exists.

- Relax is_nchoosek_pruning_applicable: linear overlap now allowed
- Add Domain.has_nchoosek_linear_overlap() to select pruning path
- Add QP + local optimize_acqf variant construction in botorch.py
- Add _build_pruning_domain to create modified domain for QP projection
- Add end-to-end test with NChooseK + mixture constraint (sum=1)

Co-Authored-By: Claude Opus 4.6 <[email protected]>
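For the mixture case (sum(x)=1) the warm-start has a simple closed form, sketched stdlib-only below. This is only an illustration of the idea: the final clip can in general break the equality again, which is exactly why the PR uses a proper QP (LinearProjection) for arbitrary linear constraints.

```python
from typing import List


def project_with_zero(
    x: List[float], k: int, total: float = 1.0, lb: float = 0.0, ub: float = 1.0
) -> List[float]:
    """Feasible warm-start with x[k] = 0 for the constraint sum(x) = total.

    Pins x[k] to zero, then shifts the free coordinates equally back
    onto the hyperplane and clips to [lb, ub]; a closed-form stand-in
    for the general QP projection used in the PR.
    """
    y = list(x)
    y[k] = 0.0
    free = [i for i in range(len(y)) if i != k]
    shift = (total - sum(y[i] for i in free)) / len(free)
    for i in free:
        y[i] = min(max(y[i] + shift, lb), ub)
    return y
```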
@Jimbo994, if you are interested ;)
Phase 1 of the BONSAI pruning refactor: a self-contained, tensor-native implementation of greedy NChooseK pruning that can later be wired into BotorchOptimizer. Strategy-side code in BotorchStrategy is untouched and will be migrated in Phase 2.

The module extends today's pruning beyond what BotorchStrategy currently does:

- active variant for semi-continuous features (allow_zero=True with lb > 0): each fractional value is resolved either to zero or snapped into [lb, ub] via QP projection + optional local optimize_acqf
- multiple NChooseK constraints with per-constraint active counts and conjunction-style termination
- per-constraint min_count guard with none_also_valid exception, raising PruningInfeasibleError when the action set empties before all max_count constraints are satisfied
- caller-supplied fixed_features flow through to both QP projection and local re-opt, and are excluded from the action set so the loop never proposes to move them
- two hyperparameters: per_step_local_reopt (refine each variant via optimize_acqf), final_local_reopt (single end-of-loop clean-up solve)
- pinned-zero-indices machinery prevents the linear redistribution from resurrecting previously-zeroed features across iterations

The cvxpy LinearProjection round-trip is replaced by BoTorch's native project_to_feasible_space_via_slsqp, eliminating the numpy boundary in the pruning loop and dropping the bespoke pruning-domain construction.

The full conceptual write-up lives in docs/pruning.md, with a worked 4-feature example and a comparison to the published BONSAI algorithm.

Test coverage in tests/bofire/strategies/test_nchoosek_pruning.py (45 unit tests): pure helpers (classification, fulfilment, counts, guard, eligibility), variant construction (zero/active, no-overlap, mixture eq, inequality-only, infeasibility), end-to-end pruning (single/multiple NChooseK, semi-continuous, min_count guard, hyperparameter smoke tests, fixed-features regression, q=0, tie-break, QP-failure fallback).
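The semi-continuous resolution step can be sketched with a crude distance-threshold stand-in (the actual module decides via QP projection plus optional local optimize_acqf; `threshold` is an illustrative parameter, not part of the API):

```python
def resolve_semicontinuous(
    value: float, lb: float, ub: float, threshold: float = None
) -> float:
    """Resolve a fractional value of a semi-continuous feature.

    The feasible set is the disconnected union {0} ∪ [lb, ub] with
    lb > 0. A fractional value in the gap (0, lb) is either zeroed
    or snapped into [lb, ub]; here a simple midpoint threshold stands
    in for the QP + local-reopt decision of the real implementation.
    """
    if threshold is None:
        threshold = lb / 2.0
    if value <= threshold:
        return 0.0  # zero variant
    return min(max(value, lb), ub)  # active variant: clamp into [lb, ub]
```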
Structural cleanup of the pruning module, no behavior change, all 118 + 6 new tests pass:

- Add Action, ActionKind, PruningContext, PruningState dataclasses; replace anonymous tuples and stringly-typed action kinds.
- Unify _build_zero_variant / _build_active_variant / _build_activate_variant behind a single _build_variant(kind, ...) dispatcher; keep the three legacy wrappers for backward compat.
- Extract _collect_actions(state, ctx, fixed_keys) -> List[Action] to fold the three duplicated action-collection for-loops.
- Extract _prune_single_candidate(x_i, X_prefix, ctx, fixed_keys) out of prune_nchoosek; the public function is now a 30-line outer loop and the inner per-candidate logic is a self-contained primitive that beam-search and B&B can build on directly.
- Replace the manual argmin loop with min(..., key=...).
- Drop the # noqa: C901 escape on prune_nchoosek.
- Add 5 dataclass smoke tests + 1 q>1 prefix-conditioning regression test (the existing test_q2 only checked that both candidates ran, not that the second conditioned on the first's pruned form).
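The dataclass-plus-dispatcher shape can be sketched minimally as follows (names mirror the commit message; bodies and the lb/ub defaults are illustrative only):

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class ActionKind(Enum):
    ZERO = "zero"          # force a feature to exactly 0
    ACTIVE = "active"      # keep a semi-continuous feature in [lb, ub]
    ACTIVATE = "activate"  # snap a gap value into the positive branch


@dataclass(frozen=True)
class Action:
    """Typed replacement for the former anonymous (kind, index) tuples."""

    kind: ActionKind
    feature_index: int


def build_variant(
    x: List[float], action: Action, lb: float = 0.1, ub: float = 1.0
) -> List[float]:
    """Single dispatcher replacing three near-identical variant builders."""
    y = list(x)
    i = action.feature_index
    if action.kind is ActionKind.ZERO:
        y[i] = 0.0
    else:  # ACTIVE / ACTIVATE: clamp into the positive branch
        y[i] = min(max(y[i], lb), ub)
    return y
```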
Pairs missing tests with the categorical/discrete pinning fix already landed in 8906fa0. The new TestPinnedColumns class covers the per-row resolution semantics (q=1 and q=2 with row-varying values), coverage of both reopt paths, and the categorical one-hot integration test. Also migrates three legacy tests from fixed_features= to the new pinned_columns= API.

Adds two TODO comments in _build_variant referencing facebook/Ax#5180:

- Skip SLSQP projection when the action dim is not in any linear constraint (perf win).
- Post-projection feasibility safety net (defensive).

Both are documented but deferred -- small wins on a non-bottleneck path, revisit if profiling shows _build_variant is hot.
is_pruning_applicable only blocks pruning when NChooseK features
*overlap* with an InterpointConstraint, NonlinearConstraint, or
ProductConstraint. If those constraints exist on non-NChooseK
features, the gate passes and pruning runs -- but the QP projection
inside prune_nchoosek only sees Linear{Equality,Inequality}
constraints (filtered by get_linear_constraints), so features
participating in the other constraint types may drift during
projection and silently break those constraints.
Fix: in BotorchOptimizer._prune_if_applicable, extend pinned_columns
to include every feature participating in those three unhandled
constraint types. The candidate carries values satisfying these
constraints at row entry (upstream optimizer respected them); pinning
preserves them by inertia.
Same defensive principle as the categorical/discrete fix in 8906fa0 --
anything pruning's QP cannot reason about, freeze at the per-row
value.
Adds a regression test asserting an InterpointEqualityConstraint
feature survives pruning unchanged.
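The widened pinning policy can be sketched as follows (hypothetical plain-Python representation of constraints as `(type_name, member_keys)` pairs; the real fix walks BoFire constraint objects in BotorchOptimizer._prune_if_applicable):

```python
from typing import Iterable, List, Set, Tuple


def pin_unhandled_constraint_features(
    feature_keys: Set[str],
    constraints: List[Tuple[str, Iterable[str]]],
) -> Set[str]:
    """Collect features appearing in constraint types the pruning QP
    cannot represent (interpoint / nonlinear / product).

    These columns are frozen at their per-row candidate values, so the
    QP projection cannot drift them and silently break the constraint.
    """
    unhandled = {"interpoint", "nonlinear", "product"}
    pinned: Set[str] = set()
    for ctype, members in constraints:
        if ctype in unhandled:
            pinned.update(k for k in members if k in feature_keys)
    return pinned
```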
BotorchOptimizer.validate_domain already rejected LSR-BO combined with non-LinearConstraint via the existing 'LSR-BO only supported for linear constraints' check; since NChooseKConstraint is not a LinearConstraint subclass, that branch already caught NChooseK + LSR-BO at strategy-construction time.

The only remaining case the validator missed was semi-continuous features (`allow_zero=True` with `bounds[0] > 0`) -- an input property, not a constraint. Extend validate_local_search_config to also reject semi-continuous features. With both cases now caught at data-model validation, the runtime NotImplementedError in BotorchOptimizer._optimize becomes redundant and is removed.

User benefit: fail at strategy construction with a ValueError, not at runtime in `ask()` with a NotImplementedError. Earlier signal, discoverable in the data model.

Adds three validator tests: NChooseK + LSR-BO rejection (regression on the existing branch), semi-continuous + LSR-BO rejection (new branch), and plain-continuous + LSR-BO passes (the supported case).
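The construction-time gate can be sketched as follows (hypothetical dict-based stand-ins for the BoFire data models; the real validator works on ContinuousInput and constraint objects):

```python
from typing import Dict, List


def validate_local_search_config(
    inputs: List[Dict], constraints: List[Dict]
) -> None:
    """Fail at strategy construction (ValueError) instead of at ask()
    time for configurations LSR-BO cannot handle."""
    # Existing branch: any non-linear constraint (NChooseK included,
    # since it is not a LinearConstraint subclass) is rejected.
    for c in constraints:
        if c["type"] != "linear":
            raise ValueError("LSR-BO only supported for linear constraints.")
    # New branch: semi-continuous inputs (allow_zero=True, lb > 0).
    for f in inputs:
        if f.get("allow_zero") and f["bounds"][0] > 0:
            raise ValueError("LSR-BO is not supported for semi-continuous inputs.")
```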
jduerholt left a comment
Nothing big, small improvements.
# Precompute linear constraints and bounds for local optimize_acqf
if use_qp:
    from bofire.strategies import utils
Why is this a lazy import?
fixed_features = {idx: 0.0}
batch_initial = projected.unsqueeze(0).unsqueeze(0)  # (1, 1, d)

try:
We could massively speed this up using batch optimization, no? We could bundle all pruning candidates from the QP projections into one batch of initial conditions and optimize over them in parallel.
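The batching idea, as a minimal stand-in sketch (`batch_acqf` is a hypothetical callable evaluating all variants in one forward pass; with BoTorch one would stack the projected variants into `batch_initial_conditions` for a single optimize_acqf call):

```python
from typing import Callable, List, Tuple


def best_variant_batched(
    variants: List[List[float]],
    batch_acqf: Callable[[List[List[float]]], List[float]],
) -> Tuple[List[float], float]:
    """Evaluate all pruning variants in one batched acquisition call
    instead of one sequential optimize_acqf call per variant."""
    values = batch_acqf(variants)  # one batched forward pass
    best = max(range(len(variants)), key=lambda i: values[i])
    return variants[best], values[best]
```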
We need to adapt the docstring here regarding NChooseK, as we are now applying it globally.
For more details, see https://www.merl.com/publications/docs/TR2023-057.pdf. Defaults to None.
relax_allow_zero (bool, optional): If True, semi-continuous `ContinuousInput`s
    (`allow_zero=True` with positive lower bound) report a relaxed lower bound of 0.
    Other input types ignore this flag. Defaults to False.
Is this a good idea to propagate it everywhere? Could we not pass this via kwargs or so, to avoid introducing the relax_allow_zero flag everywhere?
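The kwargs-based alternative could look like this sketch (hypothetical dict-based feature stand-in; only the continuous path interprets the flag, every other feature type silently ignores it):

```python
from typing import Dict, Tuple


def get_bounds(feature: Dict, **kwargs) -> Tuple[float, float]:
    """Return (lb, ub) for a feature.

    relax_allow_zero travels through **kwargs, so only the continuous
    case needs to know it exists; other feature types ignore it and
    their signatures stay free of the flag.
    """
    lb, ub = feature["bounds"]
    if kwargs.get("relax_allow_zero") and feature.get("allow_zero") and lb > 0:
        lb = 0.0  # relax the semi-continuous lower bound to 0
    return lb, ub
```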
if reference_value is not None and values is not None:
    raise ValueError("Only one can be used, `reference_value` or `values`.")

# Effective lower bound: 0 for semi-continuous features when the
This should be tested; there are also tests for the local bounds, and we should add the same for the effective lower bound.
n_iters += 1
return pd.concat(valid_samples, ignore_index=True).iloc[:candidate_count]

@staticmethod
Let us remove this from the current PR, it is bloating it, and instead just rely on the try/except around the polytope sampler, which does exactly the same thing but implicitly via botorch. What do you think?
def get_torch_bounds_from_domain(
    domain: Domain,
    input_preprocessing_specs: InputTransformSpecs,
    relax_allow_zero: bool = False,
Here we could also build in the local bounds.
return constraints


def get_nonlinear_constraints(
Why this strange API here, with an `exclude_nchoosek` flag that overrides the includes?
"\n## Objectives\n- y: Maximize\n"
"\n## Constraints (candidates MUST satisfy all of these)\n"
"- 1.0*x1 + 1.0*x2 <= 1.5 — Budget constraint"
)
delete this comment
@@ -0,0 +1,367 @@
"""Tests for the domain-level pruning applicability gates.
We are testing stuff from the pruning module here; it should live in the same file as the rest of the tests for that module.
Polish, structural cleanups, and small API improvements from the
review pass. Touches the pruning module, the BotorchOptimizer, the
RandomStrategy, the feature data models, and the corresponding tests.
Module changes:
- ContinuousInput.is_semicontinuous property; used everywhere the
`allow_zero and bounds[0] > 0` check was open-coded.
- Feature.get_bounds: `relax_allow_zero` moves into `**kwargs`; only
ContinuousInput declares it explicitly. Other subclasses drop the
meaningless argument.
- get_torch_bounds_from_domain: new `reference_experiment` kwarg for
local bounds (LSR-BO); BotorchOptimizer._optimize uses it instead
of inlined get_bounds.
- Inputs.get_number_of_categorical_combinations: new
`include_semicontinuous` flag; BotorchOptimizer._determine_optimizer
passes False when pruning is applicable (replaces the inlined
divide-out logic).
- get_nonlinear_constraints: `exclude_nchoosek` flag dropped --
callers pass `includes=[ProductInequalityConstraint, ...]` to
exclude NChooseK instead.
- _nchoosek_pruning:
- Dataclasses (ActionKind, Action, PruningContext, PruningState)
docstrings rewritten with explicit attribute sections.
- _violated_constraints renamed to _max_count_violated_constraints
for symmetry with _min_count_violated_constraints.
- Legacy variant-builder wrappers (_build_zero_variant,
_build_active_variant, _build_activate_variant) deleted; tests
migrated to _build_variant(kind=...).
- Removed defensive multi-column NChooseK guard (already enforced
by BoFire validators) and the q==0 short-circuit (loop is
already a no-op).
- TODO comment in _local_optacqf referencing batch-optimization
perf win (Ax PR #5180-style bundling of per-variant optimize_acqf
calls).
- BotorchOptimizer:
- _prune_if_applicable renamed to _prune.
- Pinning policy widened: pin every column not in
`{continuous, un-fixed, NChooseK / semi-continuous / linear-
constraint-touching} \ {Interpoint / Nonlinear / Product
constraint members}`.
- RandomStrategy: removed _subset_lp_feasible LP rejection. The
polytope sampler's interior-point solver is the authoritative
feasibility check via the existing InfeasibilityError try/except.
- benchmarks/benchmark.py: docstring updated for global NChooseK
application.
Test changes:
- Added test_continuous_input_is_semicontinuous and
test_continuous_input_feature_get_bounds_relax_allow_zero in
test_continuous.py.
- Merged tests/bofire/strategies/test_pruning_gate.py (367 lines, 3
classes) into test_nchoosek_pruning.py; old file deleted.
- Removed stale comment in test_domain.py pointing at the now-merged
test_pruning_gate.py.
- Migrated 11 _build_zero_variant / _build_active_variant /
_build_activate_variant call sites to _build_variant(kind=...).
477 targeted tests pass. Full-tree sweep matches baseline (4 pre-
existing failures unchanged; +618 collected tests).
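The includes-based filtering adopted in place of the `exclude_nchoosek` flag can be sketched as follows (hypothetical minimal class stand-ins; the real function filters BoFire constraint objects):

```python
from typing import List, Type


class Constraint: ...
class NChooseKConstraint(Constraint): ...
class ProductInequalityConstraint(Constraint): ...


def get_nonlinear_constraints(
    constraints: List[Constraint], includes: List[Type[Constraint]]
) -> List[Constraint]:
    """Filter by an explicit includes list; NChooseK is excluded simply
    by not listing it, with no overriding exclude flag."""
    return [c for c in constraints if isinstance(c, tuple(includes))]
```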
The semi-continuous enumeration in get_number_of_categorical_combinations
and get_categorical_combinations was checking `allow_zero and not
is_fixed()` -- this works in practice because the ContinuousInput
validator forbids `allow_zero=True with bounds[0]==0`, but the intent is
clearer when expressed via the is_semicontinuous property.
Same defensive principle as the pinning policy: a feature is
"conditional" (gets the 2x active/inactive factor in the combination
count) only when its feasible region is the disconnected union
`{0} ∪ [lb, ub]` -- i.e., when `is_semicontinuous` is True. The
explicit `bounds[0] > 0` check inside the property guards against any
future validator relaxation that would allow allow_zero with zero
lower bound.
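The property and the 2x combination factor can be sketched minimally (dataclass stand-in for ContinuousInput; the real class carries more fields and validation):

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ContinuousInput:
    bounds: Tuple[float, float]
    allow_zero: bool = False

    @property
    def is_semicontinuous(self) -> bool:
        # Feasible region is the disconnected union {0} ∪ [lb, ub];
        # the explicit lb > 0 check guards against a future validator
        # relaxation allowing allow_zero with a zero lower bound.
        return self.allow_zero and self.bounds[0] > 0


def n_categorical_combinations(inputs: List[ContinuousInput]) -> int:
    """Each semi-continuous input contributes a 2x factor
    (active vs. inactive branch)."""
    n = 1
    for f in inputs:
        if f.is_semicontinuous:
            n *= 2
    return n
```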
Motivation
BONSAI-style handling of NChooseK constraints (https://arxiv.org/abs/2602.07144).
@dlinzner-bcs: one could do the same in the DOE module.
Have you read the Contributing Guidelines on pull requests?
Yes.
Have you updated CHANGELOG.md?
Not yet.
Test Plan
Unit tests.